On the Decoupling of Evolutionary Changes in mRNA and Protein Levels

Abstract Variation in gene expression across lineages is thought to explain much of the observed phenotypic variation and adaptation. The protein is closer to the target of natural selection but gene expression is typically measured as the amount of mRNA. The broad assumption that mRNA levels are good proxies for protein levels has been undermined by a number of studies reporting moderate or weak correlations between the two measures across species. One biological explanation for this discrepancy is that there has been compensatory evolution between the mRNA level and regulation of translation. However, we do not understand the evolutionary conditions necessary for this to occur nor the expected strength of the correlation between mRNA and protein levels. Here, we develop a theoretical model for the coevolution of mRNA and protein levels and investigate the dynamics of the model over time. We find that compensatory evolution is widespread when there is stabilizing selection on the protein level; this observation held true across a variety of regulatory pathways. When the protein level is under directional selection, the mRNA level of a gene and the translation rate of the same gene were negatively correlated across lineages but positively correlated across genes. These findings help explain results from comparative studies of gene expression and potentially enable researchers to disentangle biological and statistical hypotheses for the mismatch between transcriptomic and proteomic data.


Introduction
Understanding the causes and consequences of evolutionary divergence in gene expression is important for explaining divergence in organismal phenotypes and adaptation (King and Wilson 1975). As proteins carry out functions encoded by protein-coding sequences and are generally thought of as the functional unit of the cell, the protein abundance (hereafter, the protein level) is expected to be the target of natural selection. However, previous work on gene expression evolution has predominantly relied on mRNA levels due to the relative simplicity and cost-effectiveness of high-throughput mRNA sequencing methods compared to mass spectrometry-based proteomics. This implicitly assumes that mRNA levels are an adequate proxy for protein levels; however, many studies, including those on mammals, flies, and yeasts, have documented weak to moderate correlations (i.e., Pearson's correlation coefficient r or Spearman's correlation coefficient ρ below 0.6) between mRNA and protein levels across genes (Gygi et al. 1999;Marguerat et al. 2012;Becker et al. 2018;Wang et al. 2019), tissues (Edfors et al. 2016;Franks et al. 2017;Fortelny et al. 2017;Perl et al. 2017;Wang et al. 2019), and species (Laurent et al. 2010;Khan et al. 2013;Ba et al. 2022). Disentangling the biological, technical, and statistical explanations for the observed correlations between mRNA and protein levels remains an open and challenging problem (de Sousa Abreu et al. 2009;Vogel and Marcotte 2012;Liu et al. 2016;Buccitelli and Selbach 2020). In order to understand the biological underpinnings of this relationship-that is, to better understand how, when, and why discrepancies between mRNA and protein levels arise on evolutionary timescales-we need mathematical models that describe the coevolution between these two aspects of gene expression and that can generate clear predictions.
In this study, we focus specifically on the correlation between mRNA and protein levels of the same gene across species. In addition to a weak to moderate correlation between mRNA and protein levels, protein levels are generally more conserved than mRNA levels across species (Schrimpf et al. 2009;Laurent et al. 2010;Khan et al. 2013). This phenomenon is hypothesized to be due to compensatory evolution, in which changes to the mRNA level can be offset by changes to translation regulation, and vice versa (Schrimpf et al. 2009;Laurent et al. 2010;Khan et al. 2013;Wang et al. 2020). Consistent with the compensatory evolution interpretation is the observed negative correlation between the evolutionary divergence of mRNA levels and that of translational efficiencies (i.e., per-transcript rate of translation, as measured by ribosome profiling). For instance, the difference between two yeast species, Saccharomyces cerevisiae and S. paradoxus in the mRNA level and that in translational efficiency of the same gene is more frequently in opposite directions than in the same direction (Artieri and Fraser 2014;McManus et al. 2014). Similarly in mammals, the amount of divergence across five species from diverse clades (eutherian mammals, marsupials, and monotremes) in the mRNA level and that in translational efficiency are negatively correlated (Wang et al. 2020). However, the observed negative correlation between the mRNA level and the translational efficiency may be attributable to a statistical artifact, as translational efficiency estimated using ribosome profiling data is a ratio of the total translation level and the mRNA level, and regressing Y/X (or Y − X after log transformation) against X is well known to induce spurious negative relationships when there is measurement error in one, or both, of the variables (Long 1980;Fraser 2019;Ho and Zhang 2019). Indeed, other studies find little evidence of compensatory evolution between transcription and translation, with changes to translation largely mirroring changes to transcription (Albert et al. 2014;Wang et al. 2018). The conflicting observations regarding the coevolution of transcription, translation, and protein levels raise the questions as to what evolutionary conditions are likely to result in compensatory evolution and whether these conditions are likely prevalent.
It is challenging to develop statistical tests to disentangle evolutionary from technical explanations. The proposed evolutionary models have been purely verbal, post hoc explanations of observed patterns, and there is not a strong theoretical foundation from which to make more quantitative predictions. More rigorous theory would also aid in the interpretation of phylogenetic models fit to comparative gene expression data. There has been a rapid increase in the availability of both comparative gene expression data and in the sophistication of statistical approaches for analyzing them (e.g., Bedford and Hartl 2009;Brawand et al. 2011;Rohlfs et al. 2014;Rohlfs and Nielsen 2015;Nourmohammad et al. 2017;Chen et al. 2019;Barua and Mikheyev 2020). However, we typically lack strong predictions about how different evolutionary processes will change the observed phylogenetic distribution of expression levels (Price et al. 2022). Furthermore, we anticipate that in the next several years, researchers will collect transcriptomic data paired with proteomic data from multiple lineages (e.g., Ba et al. 2022) and we would like to be able to develop predictions and quantitative models to describe patterns of coevolution from these data.
To address this lack of theoretical justification, we develop a framework rooted in quantitative genetics theory to investigate the coevolution of the mRNA level, the rate of translation, and the protein level of a gene when the protein level is the target of natural selection. Under this framework, we conducted simulations to demonstrate how patterns of evolutionary divergence in these traits and correlation between them are related to values of evolutionary parameters. Our simulations reveal that stabilizing selection on the protein level is sufficient to cause compensatory evolution, which holds across a variety of evolutionary conditions. We also find that evolutionary changes in the mRNA level and that in the rate of translation complement each other when the protein level is under directional selection, resulting in a negative transcription-translation correlation that is similar to that caused by stabilizing selection.

Stabilizing Selection Leads to Compensatory Evolutionary of Transcription and Translation
To understand the coevolutionary dynamics of mRNA and protein levels when the latter is subject to natural selection, we considered a model where the mRNA level and the per-transcript rate of translation (more concisely, the "translation rate") are directly affected by mutations and collectively determine the protein level. The protein level is the fitness-related trait, and we first considered the case when the protein level is subject to stabilizing selection. Under this model, we simulated the evolution of a gene's expression in 500 replicate lineages and examined the resulting distribution of phenotypes among these lineages (see Materials and Methods). This model assumes that mRNA and protein degradation rates remain constant through time, such that all evolutionary changes to mRNA and protein levels are mediated through changes to transcription and/or translation. Here, each replicate lineage can be conceived as a species, and the amount of evolutionary divergence among species was represented by the variance among these lineages. As a negative control, we also simulated mRNA and protein levels when neither trait is subject to natural selection, that is, neutral evolution.
When the protein level is under stabilizing selection, the correlation between the mRNA level and the translation rate of the same gene among species (the "transcription-translation correlation") were strongly negative (r < −0.95 under the parameter combination we considered, fig. 1A). In contrast, such a negative transcriptiontranslation correlation was not observed under neutrality (supplementary fig. S1A, Supplementary Material online). The positive correlation between the mRNA level and the protein level (the "mRNA-protein correlation") was much weaker under stabilizing selection compared to that under neutrality ( fig. 1B and supplementary fig. S1B, Supplementary Material online). These patterns are also robust to relative mutational target sizes of transcription and translation, though the mRNA-protein correlation  Decoupling of mRNA and Protein · https://doi.org/10.1093/molbev/msad169 MBE becomes stronger when the proportion of mutations that affect the mRNA level is high (supplementary table S1, Supplementary Material online). As measurement errors are known to impact empirical measures of gene expression and mRNA-protein correlation (Csárdi et al. 2015), we added noise to each end-point (i.e., at the end of the simulations, equivalent to measurements from multiple extant species) mRNA level and protein level (see Materials and Methods) and repeated our analyses to determine if our results were robust. In general, measurement error had little qualitative impact on our results (supplementary table S2, Supplementary Material online). The dependence of measurement error in the translation rate on measurement error in the mRNA level (see Materials and Methods) created a trend towards a negative correlation between the mRNA level and the translation rate under neutrality. In contrast, measurement error did not strengthen, but weakened, the negative transcription-translation correlation in the presence of stabilizing selection (supplementary table S2, Supplementary Material online). This is consistent with attenuation bias towards 0, which is known to occur when two correlated traits are measured with error.
When we examined how the variance of each trait across replicate lineages changed over time, we observed patterns that are consistent with these findings from looking at the correlation of end-points ( fig. 1A and B). Variances of both the mRNA level and the translation rate increased over time when the protein level was subject to stabilizing selection, although the variances of each trait were much lower than the neutral expectation To confirm that a negative transcription-translation correlation arises due to stabilizing selection, we varied the strength of stabilizing selection by altering the effective population size N e and SD of the fitness function σ ω . As expected, the transcription-translation correlation became more negative when selection was stronger (i.e., greater N e and/or smaller σ ω ) ( fig. 1D). In contrast, the mRNA-protein correlation was close to zero when selection was strong, but more positive when selection was weak ( fig. 1E). Together, these observations support the view that compensatory evolution between the mRNA level and the translation rate helps maintain a relatively constant protein level when protein levels are subject to stabilizing selection.
We also conducted simulations along a phylogenetic tree of 50 species ( fig. 2A) and estimated evolutionary correlations between traits (i.e., the correlation accounting for phylogenetic history) to confirm whether the above patterns would be seen in phylogenetic comparative analyses. Consistent with observations from the simulated replicate lineages, the evolutionary correlation between the mRNA level and the translation rate is strongly negative when the protein level is under stabilizing selection (r ≈ −0.8 under the parameter combinations we considered, supplementary table S3, Supplementary Material online). As this correlation is not as negative as that across replicate lineages (regression slope of the translation rate on the mRNA level b = −0.791 for simulations along a tree and b = −0.977 for simulation on replicate lineages), we repeated the simulation along trees transformed using Pagel's λ transformation (Pagel 1999) (i.e., extending external branches and shortening internal branches of the original tree) to test if the discrepancy is due to small effective sample size caused by the tree structure (Ané 2008). Indeed, evolutionary correlations estimated from the transformed trees were more similar to the correlations among replicate lineages (supplementary table S4, Supplementary Material online). We also observed a positive association of divergence time with both the mRNA level and the translation rate, while obvious saturation was seen for the protein level (fig. 2B), consistent with the time-variance relationships seen in simulations along replicate lineages ( fig. 1C). Fitting standard phylogenetic models of continuous trait evolution (see Materials and Methods) indicates that divergence of the protein level is better described by an Ornstein-Uhlenbeck (OU) process (a random walk around an optimum, commonly interpreted as a model of stabilizing selection Hansen 1997), while divergence of the mRNA level and the translation rate are better described by a Brownian motion (BM) process [a simple random walk, commonly interpreted as genetic drift or randomly varying selection (Felsenstein 1988)] (supplementary fig. S2A, C, and E, Supplementary Material online). However, when measurement error was unaccounted for, model comparisons to the mRNA level and the translation rate favored OU models as well (supplementary fig. S2A, C, and E, Supplementary Material online). This is consistent with previous findings that measurement error biases model fits toward OU models even if the traits evolved under a BM model (Pennell et al. 2015;Cooper et al. 2016).

Transcription-Translation Coevolution of Interacting Genes
Due to shared gene regulatory architecture, the expression levels of different genes are not independent of each other, and this interdependence can potentially shape the mutational architecture and influence the way evolution of expression levels is constrained (Hether and Hohenlohe 2014). Therefore, we also examined how interactions between genes influence the coevolution of mRNA and protein levels. We simulated data under a model of two interacting genes where each gene's realized transcription rate is determined collectively by its genotypic value (i.e., a baseline transcription rate) and the regulatory effect of protein product(s) of other gene(s) ( fig. 3A, also see Materials and Methods).
We first examined the scenario where one gene is only a regulator (referred to as "the regulator"), while the protein Jiang et al. · https://doi.org/10.1093/molbev/msad169 MBE level of the other gene is subject to stabilizing selection (referred to as "the target"). The target gene, which is directly under stabilizing selection, showed negative transcription-translation correlation (i.e., correlation between genotypic values of transcription and translation; fig. 3B) and weak mRNA-protein correlation ( fig. 3C) for most combinations of interaction parameter values. However, the negative transcription-translation correlation became weaker in the presence of the regulator, reflecting between-gene compensatory evolution. In other words, in the presence of a regulator, a deleterious substitution affecting the target's transcription or translation rates can be compensated by a substitution affecting transcription or translation rate of either the target itself or the regulator. In the presence of strong negative feedback (i.e., regulatory effects of two genes on each other both have large absolute values but opposite signs), the mRNAprotein correlation became more positive. The regulator also exhibited a negative transcription-translation correlation ( fig. 3D), indicating that indirect selection due to the regulatory roles of a gene is sufficient to cause its transcription-translation compensatory evolution. The mRNAprotein correlation of the regulator was weakened by the interaction but remained rather strong across the parameter space ( fig. 3E). When both genes under consideration are directly subject to stabilizing selection, we observed a negative transcription-translation across the examined parameter space, yet was weaker in the presence of strong negative feedback ( fig. 3F). Consistently, the mRNA-protein correlation is generally weak but becomes stronger when there is strong negative feedback ( fig. 3G).
Although the number of genes involved in real regulatory networks is much greater, it is impractical to evenly sample a higher-dimensional space of interaction parameter combinations. To this end, we examined a series of motifs that bear features commonly seen in real regulatory networks (supplementary fig. S4, Supplementary Material online). The correlations were mostly consistent with those of genes in two-gene regulatory motifs: the regulator(s) and the target(s) all showed negative, though not necessarily strong transcription-translation correlations, while targets subject to direct stabilizing selection showed weaker mRNA-protein correlation (supplementary table S5, Supplementary Material online). Together, these results reveal that transcription-translation compensatory evolution can take place as a result of a gene's regulatory effect on other gene(s), yet would be less pronounced due to other gene(s)' regulatory effect(s) on the gene of interest.

Transcription-Translation Coevolution of Functionally Equivalent Genes
Next, we considered a scenario where two genes express functionally equivalent proteins, such that fitness is determined by the total amount of proteins expressed from the two genes. These genes can be perceived as duplicate genes that have not yet been divergent enough in their protein sequences to be functionally distinct, in which case selection would act to maintain the total expression level (Qian et al. 2010). In simulations under this scenario, both genes show negative transcription-translation correlations (supplementary table S6

Transcription-Translation Coevolution Under Directional Selection
To understand how directional selection might influence the coevolution of mRNA and protein levels differently, we simulated evolution towards the optimal protein level in 500 replicate lineages starting from the same phenotype (see Materials and Methods). The end-point protein level is distributed around the new optimum, yet the relative contribution of mRNA level and translation to evolutionary change in the protein level varied across lineages ( fig. 4A and B). The end-point mRNA level and the end-point translation rate are negatively correlated ( fig. 4A), whereas the protein level is essentially uncorrelated with the mRNA level ( fig. 4B).
While the correlations were similar to those observed when the protein level is under stabilizing selection, variances of both the mRNA level and the translation rate are much greater among lineages that have evolved under directional selection (≈0.04 for both the mRNA level and the translation rate) than variances observed when there is stabilizing selections only (≈0.005, again for both mRNA level and the translation rate). This difference in the variances indicates the distribution of phenotypes is not only shaped by stabilizing on the protein level after the optimum is reached, but also reflects different paths different lineages took to reach the same optimal protein level.
As different genes within a genome likely have different optimal protein levels, we repeated the above simulations for multiple genes with the same starting mRNA levels and translation rates but different optimal protein levels. Specifically, we asked if patterns of correlation across species (within gene) and that across genes would be different (difference between two types of correlation illustrated in fig. 4C). Patterns of correlations across species and across genes are drastically different: for each gene, there is a strong negative correlation between the mRNA level and the translation rate, but essentially no correlation between the mRNA level and the protein level. In contrast, both correlations are positive when compared among genes ( fig. 4D and E; supplementary table S7, Supplementary Material online). These observations reflect the fact that evolutionary changes in a gene's mRNA level and translation rate were usually concordant (i.e., changing the protein level in the same direction), despite the negative correlation in terms of magnitude. Among-species variances of both the mRNA level and the translation rate are positively correlated with the absolute distance between the optimal protein level and the ancestral protein level (r > 0.99 for both the mRNA level and the translation rate), confirming the role of directional selection in shaping the distribution of phenotypes among species.

Discussion
In this study, we demonstrate how the mRNA level, the translation rate, and the protein level coevolve when it is the protein level that is subject to selection. Using simulated data generated under a quantitative genetics model of gene expression evolution, we show that stabilizing selection on the protein level can cause a negative transcription-translation correlation ( fig. 1A) and weaken the positive mRNA-protein correlation ( fig. 1B). As measurement errors are known to impact empirical estimates of gene expression, we examined the impact of random error added to the end-point traits on our results. While measurement error weakened the negative transcription-translation correlation under neutrality, it does not account for the strong correlation observed in the presence of stabilizing selection on the protein level (supplementary table S2, Supplementary Material online). Notably, as errors in the translation rate are correlated with errors in the mRNA level, a spurious correlation between estimates of these two traits might arise. Our simulations also reveal that stabilizing selection on the protein level can make the protein level more conserved across species than the mRNA level (figs. 1C and 2B), which is a pattern often found in empirical studies (Schrimpf et al. 2009;Laurent et al. 2010;Khan et al. 2013). However, we note that this phenomenon is expected to occur only under some combinations of evolutionary parameters (see the Expected Phenotypic Variances subsection of the Materials and Methods section).
The data points in fig. 1A, which correspond to different lineages, can also be interpreted as representing different genes. In this case, the end-point phenotypes shall be labeled as divergence from the ancestral phenotype. Such negative correlation between evolutionary divergence in transcription and translation of different genes has been observed in empirical studies of both budding yeasts (Artieri and Fraser 2014;McManus et al. 2014) and mammals (Wang et al. 2020). The negative correlations observed in these empirical studies shown in were weaker than those observed in our simulations, likely because different genes have different evolutionary parameters (e.g., protein levels of different genes are not under equally strong selection).
We observed that the evolution of the mRNA level and the translation rate, which are not directly under selection and have no optima, was qualitatively similar under both neutral evolution (i.e., no optimum for protein levels) and protein levels subject to stabilizing selection. Under neutrality, variance among lineages is expected to increase through time at a rate determined by the mutational variance (Lynch and Hill 1986;Khaitovich et al. 2005). In our simulations where the protein level is under stabilizing selection, variances in the mRNA level and the translation rate among replicate lineages increased through time without saturation ( fig. 1C), and phylogenetic analysis of simulations along a phylogenetic tree favored a BM model as well, consistent with neutral expectations. Importantly, the evolution of the mRNA level and the translation rate were affected by selection, as they underwent much less evolutionary divergence than expected under neutrality. Such apparently (but not truly) neutral evolution occurs Decoupling of mRNA and Protein · https://doi.org/10.1093/molbev/msad169 MBE when a trait is not under direct selection but genetically correlated to trait(s) subjected to constraint: some mutations affecting the focal traits are purged by selection due to their deleterious effect on other trait(s), yet the focal trait itself has neither an optimal value nor boundaries, allowing it to diverge indefinitely, albeit slowly (Jiang and

MBE
Zhang 2020). It should be noted that the mRNA level and the rate of translation are presumably not truly unbounded, as resources within the cell (i.e., RNA polymerase molecules, ribosomes, ATP, etc.) are ultimately limited, though we assumed that the phenotypes are far from such limits in our simulations. Previous studies have shown evolutionary divergence in the mRNA level at phylogenetic scales is best described by an OU process (Bedford and Hartl 2009;Chen et al. 2019;Dimayacyac et al. 2023), which appears in contradiction to our observation that variance in the mRNA level continued to increase through time (figs. 1C and 2B). As mentioned above, one possible explanation for this is that measurement errors create a bias in favor of the OU model (Pennell et al. 2015;Cooper et al. 2016), which is confirmed in this study (supplementary fig. S2, Supplementary Material online). The discrepancy between our simulation results and observations from analyses of real data is likely to be due in part to measurement error. It is also worth noting that the same amount of error would cause more severe bias when the total amount of divergence is low. Therefore, stabilizing selection on the protein level does make it more likely that divergence in the mRNA level appears to fit an OU model by augmenting the influence of measurement errors. Note that there could be other, nonmutually exclusive explanations to these discrepancies. For example, there might be an upper bound to each gene's mRNA level and translation rate imposed by the availability of cellular resources. Directional selection on the protein level results in patterns of correlations that are similar to those resulting from stabilizing selection. To reach the optimal protein level, the mRNA level and the rate of translation undergo evolutionary changes that are concordant in terms of direction, but complementary in terms of magnitude. As a result, negative transcription-translation correlation and weak mRNA-protein correlation are both expected among a group of species that underwent selection towards the same optimal protein level. After the optimum is reached, stabilizing selection takes over and continues to promote the same kind of correlation. The effects of stabilizing and directional selection can be collectively viewed as the effect of the fitness landscape: when there exists an optimal protein level, selection is expected to result in a negative correlation between the mRNA level and the translation rate, and weak to no correlation between the mRNA level and the protein level.
The negative transcription-translation correlation and weak mRNA-protein correlation resulting from selection on the protein level are within gene and across species. We also extended our model to explore how the same evolutionary processes would shape the correlations across different genes. We show strongly positive transcriptiontranslation and mRNA-protein correlations among genes with different optimal protein levels, demonstrating an instance of Simpson's paradox (i.e., the correlation between variables seen in certain subsets of data differs from that seen in the complete dataset). Recent studies have found rather strong mRNA-protein correlations across genes (e.g., Franks et al. 2017;Fortelny et al. 2017), and it is suggested that measurement errors played a significant role in weakening the correlations in earlier studies (Csárdi et al. 2015). Our finding reconciles these results and the weaker within-gene, among-species correlations.
In our simulations, we considered a simple mutational architecture with no pleiotropy (i.e., each mutation affects either transcription or translation but not both) as parameterization of pleiotropy in the simulations is challenging. The prevalence of pleiotropic mutations and their effects on transcription and translation are unclear. If the mRNA level and the rate of translation could be measured simultaneously for a sufficiently large number of mutant genotypes, a more complete picture of the two traits' mutational architecture could be obtained, which would allow better parameterization. Similarly, we did not consider mutations affecting the degradation rates due to the difficulty of parameterization.

Conclusion
By connecting patterns in the coevolution of the mRNA level, the rate of translation, and the protein level to explicit evolutionary processes, we demonstrate that several widely observed phenomena in between-species comparisons-namely, weak mRNA-protein correlation, negative transcription-translation correlation, and the protein levels being more evolutionarily conserved than the mRNA level-can all result from stabilizing selection on the protein level. Additionally, positive mRNA-protein correlations across genes arise because different genes have different optimal protein levels. With these connections built, our results can aid the interpretation of observation in future empirical studies and help disentangle the effects of biological and technical factors.

Model of Gene Expression
The basic model we study is a system of two equations. The first describes the rate at which the mRNA level of a single gene, R changes with time. This is given by where α is the gene's transcription rate, and γ R is the rate at which the mRNA molecules are degraded. The rate at which the protein level, P changes with time is described as where β is the per-transcript translation rate, and γ P is the rate by which protein products are degraded γ P . The β parameter can be interpreted as the rate of translation initiation, as initiation has been shown to be a major rate-limiting step of translation (Shah et al. 2013). At the equilibrium state (i.e., neither the mRNA level nor the protein level is changing), Decoupling of mRNA and Protein · https://doi.org/10.1093/molbev/msad169 We can solve equation (3) and obtain which can then be log-transformed In our model, we assume a precise match between the genotypic values (i.e., α and β) and the phenotype (i.e., equilibrium R and P) and did not consider expression noise. Although expression noise could play a role in constraining evolution of transcription and translation rates (Hausser et al. 2019), we assume, in this study, that selection imposed by noise is far weaker than that imposed by the mean protein level and thus negligible.

Expected Phenotypic Variances
Under the assumption that degradation rates in equation (5) where Cov( ln α, ln β) is the covariance between ln α and ln β, and ρ is their correlation coefficient. The protein level is more conserved than the mRNA level if If we re-arrange the inequality, we obtain the conditions under which we would expect to see the more inter-lineage variation in R than P We note that the inequality in equation (8) is invalid when ���������� Var( ln β) > 2 ���������� Var( ln α) √ since ρ is only defined between −1 and 1, which means the protein level can be more conserved than the mRNA only if variance in the rate of translation is not too much higher than variance of the mRNA level. The variances [Var( ln β) and Var( ln α)] are depending on multiple factors, including strength of selection on the protein level, mutational parameters, and indirect effect of selection due to interaction with other gene(s) (see below). Therefore, the observation the protein level is more conserved than the mRNA level reflects the collective effect of multiple factors and may not occur depending on value of different parameters.

Model of Interaction Between Genes
Let us consider two interacting genes, gene 1 and gene 2. Gene 1's protein can influence gene 2's transcription, and vice versa. We assume that the expression level of gene 1 and gene 2 are governed by the dynamics depicted above, with each gene having its own genotypic values for the translation and transcription rate parameters (i.e., the α value for gene 1 is α 1 and for gene 2 is α 2 ). We assume that the rates of mRNA and protein degradation are the same for all genes. In this model, the two genes interact according to a set of interaction parameters C. The parameter C 1,2 is the effect of gene 1's protein product P 1 on gene 2's transcription level, and C 2,1 is the reverse. We can then set up a system of differential equations as follows: When the interaction parameter is negative, the regulatory effect on the target gene is repression. This is reflected in the asymptotic decrease of the realized transcription rate (α 1 P C 2,1 2 or α 2 P C 2,1 1 ) as the concentration of the repressor increases. Conversely, a positive interaction parameter indicates an activation effect, where the realized transcription rate increases with the concentration of the activator.
It is worth noting that the activation effect would plateau as the activator's concentration increases, because an excess of activator molecules would not be able to bind the target. Although the realized transcription rate increases monotonously in our model, our approximation remains reasonable when the interaction parameter is between 0 and 1 and the regulator's expression level is not extremely high. Given our focus on scenarios where stabilizing selection is in action, we can assume that the concentration of the regulator's protein would never reach a level where target's transcription rate plateaus. We did not consider interaction parameter values above one in this study.
At equilibrium, the following four equations hold: After log-transformation and rearrangement, we can express equation (10) in matrix form as follows: Jiang et al. · https://doi.org/10.1093/molbev/msad169 Solving the system of linear equations gives the logarithms of R 1 , R 2 , P 1 , and P 2 . This model can be extended to systems of three or more genes. For gene i in a system of n genes, we have the following differential equations: Again, after log-transformation and rearrangement, we can express equation (12) in matrix form as follows: 0 C 1,3 0 C 2,3 −1 0 · · · 0 C n,3 0 0 0 0 1 −1 · · · 0 0 · · · · · · · · · · · · · · · · · · · · · · · · · · · 0 C 1,n 0 C 2,n 0 C 3,n · · · −1 0 Note that the system may not have a solution (i.e., the leftmost matrix may not be invertible), depending on values of the parameters. A biological mechanism underlying such scenarios is positive feedback: if a set of genes activate each other, and their initial expression levels are high enough such that the degradation cannot counteract the increase of their expression, there would not be an equilibrium. Instead, their expression levels would increase indefinitely until a physical barrier is reached (e.g., limited by availability of ribosomes). When we simulated evolution of two interacting genes (see below), we considered all combinations of the following values for C 1,2 and C 2,1 : −0.8, −0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6, and 0.8. When we simulated evolution of three interacting genes, we had absolute values of all nonzero interaction parameters equal to 0.5. The set of triple-gene regulatory motif examined in this study were chosen because they bear features commonly seen in real regulatory networks (Lee et al. 2002;Shen-Orr et al. 2002). Specifically, motif 1 is a simple regulator chain, motif 2 is a negative feedback loop, motifs 3-5 represent regulatory motifs where the same target is regulated by more than one regulators, while motifs 6-8 represent motifs where one regulator (i.e., transcription factor) regulates more than one targets (see supplementary fig. S4, Supplementary Material online for graphical depictions of the motifs).

Simulation of Evolution Along a Single Lineage
For a system of n genes, we considered a total of 2n traits that are directly affected by mutations, including logtransformed genotypic values of their transcription rates (ln α 1 , . . . , ln α n ) and translation rates (ln β 1 , . . . , ln β n ), and simulated the evolution of the population mean phenotype through time. Traits that are directly under selection are the equilibrium protein levels (ln P 1 , . . . , ln P n ). Given the genotypic values, the steady-state protein levels are calculated using equation system (5) when only a single gene is under concern, using equation (11) for a system of two genes, or equation (13) for a system of three or more genes. As the degradation terms in equation (5) are constants, we omitted them in the simulations when only a single gene is considered; that is, ln R is represented by ln α, and ln P is represented by ln α + ln β. The total number of mutations that would occur in time step t, denoted m t , is drawn from a Poisson distribution. The mean of the distribution, E[m t ], is given by where N e is the effective population size, U is the total per-genome rate of mutations affecting the transcription and translation rates, while U α i and U β i are rates of mutations affecting gene is transcription and translation rates (i.e., number of mutations per haploid genome per time step), respectively. It is assumed here that a mutation can affect either transcription or translation rate of one gene, but not both. The probability that a mutation affects the transcription rate of gene i is U α i / U, and the probability that it affects the translation rate of gene i is U β i / U. The phenotypic effect of a mutation affecting a gene is transcription rate (ln α i ) is drawn from a normal distribution N (0, σ α i ). Similarly, if the mutation affects the translation rate of gene i (ln β i ), its effect is drawn from another normal distribution, N (0, σ β i ). Given the protein level of a gene i, Decoupling of mRNA and Protein · https://doi.org/10.1093/molbev/msad169 MBE the fitness ω with respect to its protein level is given by a Gaussian function: where O is the optimal protein level and σ ω is the SD of the fitness function (also referred to as width of the fitness function). In a system where protein levels of n genes are subject to selection, the overall fitness is calculated as When n = 1, equation (16) gives the same result as (15). If there is any gene that has no equilibrium phenotype, the fitness would be treated as zero, though such situations did not occur in our simulations. The coefficient of selection, s, is then calculated as s = (ω/ω A ) − 1, where ω A is the ancestral fitness. The fixation probability p f of the mutation is calculated following the approach of Kimura (1962): With probability p f , the mutation's phenotypic effect would be added to the population's mean before the next mutation is considered. The above process would be repeated for T times for each lineage. We set T = 10 5 , N e = 1000, U α = U β = 5 × 10 −4 (i.e., 2N e U = 1), σ α = σ β = 0.1, ln O = 0, and σ ω = 1 for all simulations, unless specified. The choice of N e and σ ω was initially arbitrary, but turned out sufficient for the effect of stabilizing selection to manifest: as seen in fig. 1D and E, correlations under this combination is very different from those under weak selection (i.e., lowerleft corners of the heat maps) yet similar to many other combinations of relatively strong selection. Therefore, we chose to use these values for most of the simulations. When we simulated a single gene that is subject to directional selection, we set ln O = 1. All simulations started with ln α = 0 and ln β = 0, unless specified. For each combination of parameter values, we simulated 500 independent lineages.
When we performed simulations using different combinations of U α and U β , we had a constant total mutation rate U α + U β = 10 −3 and let the relative contribution of mutations affecting transcription and those affecting translation vary. Values of U α and U β were chosen such that the expected proportion of mutations that affect transcription (U α /(U α + U β )) is equal to 10%, 20%, 30%, 40%, 50% (the "default" combination), 60%, 70%, 80%, and 90% (supplementary table S1, Supplementary Material online).
For simulations where multiple genes with different optimal protein levels were involved, we randomly sampled a set of 20 optima ln O ∼ N (0, 2). For each gene, we conducted simulation along 500 replicate lineages. We assumed no linkage between these genes and had them evolve independently (i.e., simulations for different genes were run separately).
For simulations of functional equivalent genes, fitness is calculated as where P 1 and P 2 are two genes' respective protein levels. In these simulations, the two genes are assumed to be unlinked and independently regulated (i.e., each mutation can affect either transcription or translation of only one gene).
It should be noted that our simulations were based on a sequential-fixation model of evolution, which allows more efficient simulations. That is, only one mutation is considered each time, and fixation probability of a mutation is calculated after it is determined if the previous mutation being considered is fixed. Such a model can be a good approximation as long as the mutation rate is not too high such that the probability that multiple mutations affecting the trait of interest segregate in the population at the same time is very low (Lynch 2020).

Simulation Along a Phylogenetic Tree
We generated a Yule tree (no extinction) and used this for all subsequent simulations ( fig. 2A). We then rescaled the tree to make its height (i.e., distance between the root and each tip) equal to 10 5 . We simulated evolution along each branch following the same procedure as above. The number of time steps is equal to the branch length (rounded down to the nearest integer). As this is purely for illustrative purposes (the model is the same as the case studied above), we used a single, representative tree. The qualitative patterns did not depend on the shape of the tree.
For each simulation, we estimated the evolutionary variance-covariance matrix between lineages. To assess the influence of tree structure, we used a λ transformation (Pagel 1999) to rescale the relative length of terminal branches with tree height kept the same. We fit two models, BM and OU to each trait (the mRNA level, the translation rate, and the protein level) for each simulation and compared the relative support of the models using their samplesize corrected AICc weights (following Pennell et al. 2015) and averaged the AICc weights across replicate simulations. The BM model describes a continuous-time random walk process where the amount of change in the population mean ̅ z of a trait over a time interval t is given by: where σ is the rate parameter of the process and dW is a drawn from N (′, ⊔). Under an OU process Δ̅ z is described by: where θ is the optimum value and α is the strength of attraction towards the optimum. All phylogenetic analyses were conducted using geiger (Pennell et al. 2014).

Adding Measurement Errors to Simulated Data
For each sample (i.e., an independent lineage or a tip of the phylogenetic tree), we added errors to the end-point mRNA and protein levels; the errors ε were normally distributed and centered on the true value [i.e., ε ∼ N (0, σ ϵ )]. We considered values of σ ϵ = {0.01, 0.02,0.03, 0.04, 0.05} in this study for both mRNA and protein levels. The "estimated" translation rate is calculated as a ratio of the "measured" protein and mRNA levels, such that error in the translation rate is correlated with error in the mRNA level.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.